Conversation
Pull request overview
Adds first-class support for a per-conversation unique identifier (conversation_id) on LLMInterface, while also clarifying/renaming the runner’s run-level ordinal to conversation_index and standardizing response metadata access via a last_response_metadata property.
Changes:
- Introduces `LLMInterface.conversation_id`, plus `create_conversation_id()`/`ensure_conversation_id()`, and converts response metadata access from `get_last_response_metadata()` to `last_response_metadata`.
- Updates built-in LLM clients and tests to use the new metadata property and to call `ensure_conversation_id()` after responses/errors.
- Renames runner/test parameter `conversation_id` ➜ `conversation_index` while keeping result key `"id"` for compatibility; updates docs accordingly.
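Based on the names listed above, the base-class additions might look roughly like the sketch below. The uuid-based ID format and the constructor shape are assumptions for illustration, not the PR's actual code:

```python
import copy
import uuid


class LLMInterface:
    """Minimal sketch of the base-class additions (ID format is assumed)."""

    def __init__(self):
        # Each interface starts with a fresh per-conversation identifier.
        self.conversation_id = self.create_conversation_id()
        self._last_response_metadata = {}

    @staticmethod
    def create_conversation_id():
        # Assumed implementation: any unique string would do.
        return str(uuid.uuid4())

    def ensure_conversation_id(self):
        # Guarantee an ID exists after a response or an error path.
        if not self.conversation_id:
            self.conversation_id = self.create_conversation_id()
        return self.conversation_id

    @property
    def last_response_metadata(self):
        # Reads always return a deep copy so callers can't mutate internal state.
        return copy.deepcopy(self._last_response_metadata)

    @last_response_metadata.setter
    def last_response_metadata(self, value):
        self._last_response_metadata = value
```

The deep-copy-on-read contract is why clients that update metadata in place switch to the private `_last_response_metadata` attribute.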
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| llm_clients/llm_interface.py | Adds conversation_id + metadata property and helper methods. |
| llm_clients/openai_llm.py | Switches to _last_response_metadata for in-place updates; calls ensure_conversation_id(). |
| llm_clients/azure_llm.py | Same as above for the Azure client, including error paths. |
| llm_clients/claude_llm.py | Same as above for the Claude client. |
| llm_clients/gemini_llm.py | Same as above for the Gemini client. |
| llm_clients/ollama_llm.py | Calls ensure_conversation_id() and removes the old metadata getter. |
| generate_conversations/runner.py | Renames conversation_id ➜ conversation_index; resets agent.conversation_id per conversation. |
| generate_conversations/conversation_simulator.py | Uses the last_response_metadata property for logging metadata. |
| tests/unit/llm_clients/test_* | Updates tests to use last_response_metadata and renames copy-behavior tests. |
| tests/unit/llm_clients/test_llm_interface.py | Adds unit coverage for the conversation_id helpers. |
| tests/mocks/mock_llm.py | Removes the old metadata getter; calls ensure_conversation_id(). |
| tests/integration/test_conversation_runner.py | Renames conversation_id ➜ conversation_index in integration tests. |
| docs/evaluating.md | Documents the new metadata property and conversation_id flow. |
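The runner change ("resets agent.conversation_id per conversation") could be sketched as follows; the function name, persona loop, and result shape are hypothetical — only the `conversation_id`/`create_conversation_id()` surface and the 1-based `"id"` key come from the PR:

```python
def run_conversations(agent, personas):
    """Hypothetical sketch of the runner's per-conversation reset.

    Assumes `agent` exposes conversation_id and create_conversation_id()
    as added by this PR; everything else is illustrative.
    """
    results = []
    for conversation_index, persona in enumerate(personas, start=1):  # 1-based
        # Reset the agent's ID so each persona starts a fresh conversation.
        agent.conversation_id = agent.create_conversation_id()
        results.append({
            "id": conversation_index,  # key kept as "id" for compatibility
            "persona": persona,
            "conversation_id": agent.conversation_id,
        })
    return results
```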
Pull request overview
Copilot reviewed 19 out of 19 changed files in this pull request and generated 6 comments.
Pull request overview
Copilot reviewed 19 out of 19 changed files in this pull request and generated 4 comments.
```diff
  usage = metadata["token_usage"]
- self.last_response_metadata["usage"] = {
+ self._last_response_metadata["usage"] = {
      "input_tokens": usage.get("input_tokens", 0),
```
This is off-topic, but... do we save this metadata anywhere? It doesn't seem to be in the logging output for chat generation or the log output for judging... but if we have total token usage for each conversation and judging evaluation somewhere that we could write out, it would help everyone with understanding costs. (This is probably a separate ticket... but only if we really aren't storing it anywhere.)
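If per-turn usage really isn't persisted, a small aggregator over the per-turn metadata would be enough to roll up costs per conversation. This helper is entirely hypothetical; the field names just mirror the `token_usage` keys in the quoted diff:

```python
def total_token_usage(per_turn_metadata):
    """Sum input/output tokens across a conversation's turns.

    Hypothetical helper: assumes each turn's metadata carries a
    "usage" dict with input_tokens/output_tokens counters.
    """
    totals = {"input_tokens": 0, "output_tokens": 0}
    for metadata in per_turn_metadata:
        usage = metadata.get("usage", {})
        totals["input_tokens"] += usage.get("input_tokens", 0)
        totals["output_tokens"] += usage.get("output_tokens", 0)
    return totals
```

Writing the totals into each conversation's result dict (or the judging logs) would make cost accounting straightforward.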
emily-vanark left a comment
Couple questions / suggestions, but the tests run, and it looks okay to me. Would like @sator-labs ' opinion on it if he has time.
nz-1 left a comment
lgtm - left one comment for consideration
```python
model_name=self.agent_model_config["model"],
name=self.agent_model_config.get("name", "Provider"),
system_prompt=self.agent_model_config.get(
    "system_prompt", "You are a helpful AI assistant."
```
I would maybe have this as an optional arg with the default, since it seems a potentially consequential decision and it's buried here
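The suggestion could look like this sketch; the class name, constructor shape, and module-level constant are assumptions — only the config lookup and default prompt string come from the quoted code:

```python
# Surfacing the default at module level makes the decision visible and overridable.
DEFAULT_AGENT_SYSTEM_PROMPT = "You are a helpful AI assistant."


class ConversationRunner:
    """Sketch: lift the buried default system prompt into an explicit optional arg."""

    def __init__(self, agent_model_config, system_prompt=DEFAULT_AGENT_SYSTEM_PROMPT):
        self.agent_model_config = agent_model_config
        # An explicit config value still wins; the arg provides a visible default.
        self.system_prompt = agent_model_config.get("system_prompt", system_prompt)
```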
Description
Conversation ID support and naming
- Renamed `conversation_id` to `conversation_index` in the runner and tests. It remains the 1-based index of a conversation within a run; the result dict still uses the key `"id"` for compatibility (noted to revisit changing it to `"index"`).
- Added `conversation_id`, `create_conversation_id()`, and `_update_conversation_id_from_metadata()`. Clients set `self.conversation_id` at init, call `_update_conversation_id_from_metadata()`, and overwrite it if the API returns a new `conversation_id`. The runner resets `conversation_id` when a new persona "enters the room".
- Replaced the `get_last_response_metadata()` API with a `last_response_metadata` property (backed by `_last_response_metadata`) so reading it always returns a deep copy. Implementations that mutate metadata in place use `_last_response_metadata`; the setter still accepts full dicts via `self.last_response_metadata = {...}`.
- Updated `docs/evaluating.md` with a "Conversation flow and history" section and clarified how to use `last_response_metadata`, `conversation_id`, and `_update_conversation_id_from_metadata()` when implementing a custom LLM client.
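The client-side contract described here might be exercised like the sketch below. Only `conversation_id` and `_update_conversation_id_from_metadata()` are names from the PR; the mock client, its `generate()` method, and the `"api-123"` response are illustrative:

```python
import uuid


class MockClient:
    """Sketch of a client honoring the PR's conversation_id contract."""

    def __init__(self):
        # Set at init so an ID exists before any API call.
        self.conversation_id = str(uuid.uuid4())
        self._last_response_metadata = {}

    def _update_conversation_id_from_metadata(self, metadata):
        # Overwrite the local ID only if the API handed back a new one.
        api_id = metadata.get("conversation_id")
        if api_id:
            self.conversation_id = api_id

    def generate(self, prompt):
        # Pretend API response; a real client would call its SDK here.
        metadata = {"conversation_id": "api-123", "token_usage": {}}
        self._last_response_metadata = metadata
        self._update_conversation_id_from_metadata(metadata)
        return "ok"
```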